Project: Sarcasm Detection

Install TensorFlow 2.0

Get Required Files from Drive

## Reading and Exploring Data

Read the data from "Sarcasm_Headlines_Dataset.json". Explore the data and note some initial insights. (4 marks)

Hint - As the file is in JSON Lines format, use the pandas.read_json function with the parameter lines=True.
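A minimal sketch of the reading and exploring step. It uses a tiny inline sample in place of the real file so it runs anywhere; in the project you would pass the actual filename instead:

```python
import io
import pandas as pd

# In the project: df = pd.read_json("Sarcasm_Headlines_Dataset.json", lines=True)
# Here, a tiny inline sample with the same three columns stands in for the file.
sample = io.StringIO(
    '{"article_link": "https://example.com/a", "headline": "boring headline", "is_sarcastic": 0}\n'
    '{"article_link": "https://example.com/b", "headline": "totally sincere headline", "is_sarcastic": 1}\n'
)
df = pd.read_json(sample, lines=True)  # lines=True: one JSON object per line

# Basic exploration: shape, columns, and class balance.
print(df.shape)
print(df.columns.tolist())
print(df["is_sarcastic"].value_counts())
```

Useful insights here include the number of headlines and how balanced the sarcastic vs. non-sarcastic classes are.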

Drop article_link from the dataset. (2 marks)

We only need the headline text and the is_sarcastic column for this project, so the article_link column can be dropped.
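Dropping the column is a one-liner, sketched here on a small stand-in DataFrame:

```python
import pandas as pd

# Stand-in for the loaded dataset.
df = pd.DataFrame({
    "article_link": ["https://example.com/a"],
    "headline": ["some headline"],
    "is_sarcastic": [0],
})

# Only headline and is_sarcastic are needed; drop the URL column.
df = df.drop(columns=["article_link"])
print(df.columns.tolist())
```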

Get the length of each headline and find the maximum length. (4 marks)

Headlines vary in length, so we need to pad our sequences to the maximum length.
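One way to compute the per-headline word counts and their maximum (a sketch on toy data; length in words is an assumption, since the padding later operates on word tokens):

```python
import pandas as pd

df = pd.DataFrame({"headline": ["short one", "a somewhat longer headline here"]})

# Word count of each headline; the maximum sets the padding length.
lengths = df["headline"].str.split().str.len()
max_len = lengths.max()
print(max_len)
```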

## Modelling

Import the modules required for modelling.

Set the different parameters for the model. (2 marks)
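The parameters might look like the following. The specific names and values are illustrative assumptions, not requirements; tune them as needed:

```python
# Illustrative hyperparameters (values are assumptions, tune as needed):
max_features = 10000  # vocabulary size passed to the tokenizer (num_words)
max_len = 25          # padding length, e.g. the maximum headline length found above
embed_size = 100      # dimensionality of the GloVe vectors used later
batch_size = 100      # as required by the fitting step
epochs = 5
```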

Apply the Keras Tokenizer to the headline column of your data. (4 marks)

Hint - First create a tokenizer instance using Tokenizer(num_words=max_features), then fit it on your data column df['headline'] using .fit_on_texts().
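The hint above can be sketched as follows, with a small list of strings standing in for df['headline'] (the padding length of 5 is an assumption for the toy data):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

headlines = ["cat bites man", "man bites cat back"]  # stand-in for df['headline']

max_features = 10000
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(headlines)  # learn the word -> integer index mapping

sequences = tokenizer.texts_to_sequences(headlines)  # words -> integer ids
padded = pad_sequences(sequences, maxlen=5)          # pad to a common length
print(padded.shape)
```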

Define X and y for your model.

Get the vocabulary size. (2 marks)

Hint : You can use tokenizer.word_index.

## Word Embedding

Get GloVe Word Embeddings

Get the word embeddings from the embedding file, as given below.

Create a weight matrix for the words in the training docs.
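A sketch of both steps. A tiny inline 3-dimensional "GloVe file" stands in for the real one (a file such as glove.6B.100d.txt is an assumption about which GloVe variant you use), and a hand-written word_index stands in for tokenizer.word_index:

```python
import io
import numpy as np

# In the project you would open the real GloVe file, e.g. glove.6B.100d.txt;
# here a toy 3-dimensional file keeps the sketch self-contained.
glove_file = io.StringIO(
    "cat 0.1 0.2 0.3\n"
    "man 0.4 0.5 0.6\n"
)

# Step 1: parse the embedding file into word -> vector.
embeddings_index = {}
for line in glove_file:
    parts = line.split()
    embeddings_index[parts[0]] = np.asarray(parts[1:], dtype="float32")

# Step 2: build the weight matrix, one row per word index.
word_index = {"cat": 1, "bites": 2, "man": 3}  # stand-in for tokenizer.word_index
embed_size = 3
embedding_matrix = np.zeros((len(word_index) + 1, embed_size))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:  # words missing from GloVe keep an all-zero row
        embedding_matrix[i] = vector

print(embedding_matrix.shape)
```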

Create and Compile your Model. (7 marks)

Hint - Use a Sequential model instance, then add an Embedding layer, a Bidirectional(LSTM) layer, and dense and dropout layers as required. Finally, add a dense layer with sigmoid activation for binary classification.
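The hint above might be realised as follows. The layer sizes and dropout rate are assumptions to tune; the GloVe matrix is loaded into the Embedding layer via a Constant initializer and frozen:

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import (Bidirectional, Dense, Dropout, Embedding,
                                     Input, LSTM)
from tensorflow.keras.models import Sequential

vocab_size, embed_size, max_len = 5000, 100, 25        # assumed values
embedding_matrix = np.zeros((vocab_size, embed_size))  # from the GloVe step

model = Sequential([
    Input(shape=(max_len,)),
    # Pre-trained GloVe weights, frozen so they are not updated during training.
    Embedding(vocab_size, embed_size,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),
    Bidirectional(LSTM(64)),
    Dense(32, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),  # binary output: sarcastic or not
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
print(model.output_shape)
```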

Fit your model with a batch size of 100 and validation_split=0.2, and state the validation accuracy. (5 marks)
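A runnable sketch of the fitting step on synthetic data. The toy shapes, tiny model, and single epoch are assumptions so the example finishes quickly; in the project you would fit the real model on X and y for more epochs:

```python
import numpy as np
from tensorflow.keras.layers import Bidirectional, Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Sequential

# Toy stand-ins for the real X, y, and model.
max_len, vocab_size = 10, 50
X = np.random.randint(1, vocab_size, size=(200, max_len))
y = np.random.randint(0, 2, size=(200,))

model = Sequential([
    Input(shape=(max_len,)),
    Embedding(vocab_size, 8),
    Bidirectional(LSTM(4)),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# batch_size=100 and validation_split=0.2 as required by the task.
history = model.fit(X, y, batch_size=100, validation_split=0.2,
                    epochs=1, verbose=0)
print(history.history["val_accuracy"][-1])  # the validation accuracy to report
```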

Model 2 (only a learning parameter is added; everything else is kept the same as the model above)
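If the "learning parameter" refers to an explicit learning rate on the optimizer (an assumption; the value 0.001 is also illustrative), the only change from the first model is at compile time:

```python
from tensorflow.keras.optimizers import Adam

# Assumption: the added "learning parameter" is an explicit learning rate.
optimizer = Adam(learning_rate=0.001)
# Then recompile the same architecture with this optimizer:
# model.compile(loss="binary_crossentropy", optimizer=optimizer,
#               metrics=["accuracy"])
print(optimizer.get_config()["learning_rate"])
```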

Model 3: Inclusion of an Average Pooling layer
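One way to include average pooling (a sketch with assumed sizes): the LSTM must return its full sequence of outputs so GlobalAveragePooling1D has a timestep axis to average over:

```python
from tensorflow.keras.layers import (Bidirectional, Dense, Embedding,
                                     GlobalAveragePooling1D, Input, LSTM)
from tensorflow.keras.models import Sequential

max_len, vocab_size, embed_size = 25, 5000, 100  # assumed values

model3 = Sequential([
    Input(shape=(max_len,)),
    Embedding(vocab_size, embed_size),
    # return_sequences=True keeps one output per timestep for pooling.
    Bidirectional(LSTM(64, return_sequences=True)),
    GlobalAveragePooling1D(),  # average over the timestep axis
    Dense(1, activation="sigmoid"),
])
model3.compile(loss="binary_crossentropy", optimizer="adam",
               metrics=["accuracy"])
print(model3.output_shape)
```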

Let's check whether the model can correctly classify a given sentence as sarcastic or not.
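A small helper for this check, assuming a fitted `tokenizer`, a trained `model`, and the padding length `max_len` from earlier (the helper name and the 0.5 threshold are assumptions):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences


def predict_sarcasm(sentence, tokenizer, model, max_len):
    """Classify one sentence with a fitted tokenizer and trained model."""
    seq = tokenizer.texts_to_sequences([sentence])       # words -> integer ids
    padded = pad_sequences(seq, maxlen=max_len)          # same padding as training
    prob = float(model.predict(padded, verbose=0)[0][0]) # sigmoid output in [0, 1]
    label = "sarcastic" if prob >= 0.5 else "not sarcastic"
    return label, prob
```

Usage: `predict_sarcasm("thirtysomething scientists unveil doomsday clock", tokenizer, model, max_len)` returns the predicted label and the model's probability.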

The End